A Model-Driven Probabilistic Parser Generator

نویسندگان

  • Luis Quesada
  • Fernando Berzal Galiano
  • Francisco J. Cortijo
چکیده

Existing probabilistic scanners and parsers impose hard constraints on the way lexical and syntactic ambiguities can be resolved. Furthermore, traditional grammar-based parsing tools are limited in the mechanisms they allow for taking context into account. In this paper, we propose a model-driven tool that allows for statistical language models with arbitrary probability estimators. Our work on model-driven probabilistic parsing is built on top of ModelCC, a modelbased parser generator, and enables the probabilistic interpretation and resolution of anaphoric, cataphoric, and recursive references in the disambiguation of abstract syntax graphs. In order to prove the expression power of ModelCC, we describe the design of a general-purpose natural language parser.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...

متن کامل

Parser-Based Retraining for Domain Adaptation of Probabilistic Generators

While the effect of domain variation on Penntreebank-trained probabilistic parsers has been investigated in previous work, we study its effect on a Penn-Treebank-trained probabilistic generator. We show that applying the generator to data from the British National Corpus results in a performance drop (from a BLEU score of 0.66 on the standard WSJ test set to a BLEU score of 0.54 on our BNC test...

متن کامل

Re-engineering Eclipse MDT/OCL for Xtext

The current tooling used for the Eclipse OCL project uses an LALR parser generator. Enhancing the tooling to support editing motivated a migration to exploit the inherently model-driven characteristics of Xtext. This paper summarizes the experiences of that migration, identifies the many benefits and discusses a few changes in implementation approach that were required. Objective performance an...

متن کامل

Adapting a Probabilistic Disambiguation Model of an HPSG Parser to a New Domain

This paper describes a method of adapting a domain-independent HPSG parser to a biomedical domain. Without modifying the grammar and the probabilistic model of the original HPSG parser, we develop a log-linear model with additional features on a treebank of the biomedical domain. Since the treebank of the target domain is limited, we need to exploit an original disambiguation model that was tra...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1205.3183  شماره 

صفحات  -

تاریخ انتشار 2012